Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][apps] Implemented the socket close reason feature #2747

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ethouris
Copy link
Collaborator

@ethouris ethouris commented Jun 7, 2023

POC for the "close reason" feature.

Close reason is a code that is set to record, who was responsible for breaking the connection and closing the socket.

This reason is "one shot", as it's always set to the unset value all the time and also during connection, until the connection was broken or the socket was closed. As many things like this may happen even multiple times on one socket, only first such activity counts, and every next one will not override a once set value.

In general, this code can be set from the following sources:

  • the socket closed by the application, while it was connected
  • receiver buffer overflow in live mode
  • sequence discrepancy or various protocol errors
  • received UMSG_SHUTDOWN message from the peer

If the connection has been broken because of a received UMSG_SHUTDOWN message, it should also contain the reason code in this message. If this succeeds, this is recorded on the side that received this message as a peer code and the agent will contain the "peer" redirection information. If it was the agent that has first broken the connection or closed the socket, the appropriate code will be in the agent code location and the peer code is unset.

API changes:

  • The SRT_CLOSE_INFO structure provides the close reason code information:
    • agent: code for agent, if it was agent to break the socket, otherwise it contains SRT_CLS_PEER.
    • peer: code for peer extracted from UMSG_SHUTDOWN message, or SRT_CLS_FALLBACK if the peer doesn't support it
    • time: time when it happened, in the same convention as srt_time_now().
  • Use srt_close_getreason(socket, close_info_ptr) to retrieve the close reason for a socket. Note that it doesn't matter if the socket was called srt_close() before this call, although you should not call it unless you know that there was a transmission error on a socket or at least that the connection was broken.
  • Use srt_close_withreason(socket, reason_code) to close the socket and set the user reason. Note that only values of SRT_CLSC_USER or greater are accepted, otherwise it behaves just like srt_close().
  • The enum type SRT_CLOSE_REASON is filled with reason codes. The close reason codes are designed to match the rejection codes at least in some common part, but only about a half of them is similar in both.

Protocol changes:

The only protocol change is that the UMSG_SHUTDOWN message now carries additional data, which is only the 32-bit rejection code value. Note that all messages that didn't carry extra data were filled by zero-padding for 4 bytes, as the UDT author explained, due to the requirement of writev (and likely later sendmsg) that didn't accept zero-length block declarations in multi-block calls. This way, so far UMSG_SHUTDOWN was actually sending a 4-byte body filled with value 0. This matches the SRT_CLS_UNKNOWN code. So, for compatibility:

  • old peer sending UMSG_SHUTDOWN to the new peer sends the SRT_CLS_UNKNOWN value, which is then fixed into SRT_CLS_FALLBACK, which means that it was shut down by the other party, but that party doesn't support the feature.
  • new peer sending UMSG_SHUTDOWN to the old peer is not a problem because older versions do not read any body data from this message.

Nuisances:

  1. The information is either retrieved from the socket or from a temporary database that stores this information so that it can be retrieved even if the socket in question was already physically deleted. Currently this information is set to stay for 10 GC cycles and only up to 10 newest records are kept there.
  2. If a particular party was responsible for breaking the connection, it doesn't necessarily mean that its peer will contain .agent == SRT_CLS_PEER and this party's reason code in .peer. It may always happen that both sides break the connection at the same time independently and in this case both sides could show as if they were responsible. This is also the reason why this close reason field cannot be just one for all purposes, or why the rejection reason functionality cannot be at least partially reused.
  3. The UMSG_SHUTDOWN message is volatile and not under journal protection (as it's with handshake or ACK). If the UDP packet carrying this message was lost in the network, the party that should receive it will not learn about it and likely in this case it will have .agent == SRT_CLS_PEERIDLE. This means that even if there is a message passing possible in order to keep the blame on the peer, this is not guaranteed to work in every case.

The recommendation for applications: always extract the close reason code and provide the user with this information, and the users should retrieve them on both connection parties to ensure the maximum of retrievable information. This information is highly volatile and there might potentially exist multiple reasons why the socket was closed or the connection was broken, and this functionality provides the very first of them.

The testing application srt-test-live is armed with displaying this code and it also uses two user codes: 0 (transmission interrupted) and 1 (configuration error).

@maxsharabayko
Copy link
Collaborator

Please add the following description:

  • What API changes are proposed? srt_close_withreason, srt_close_getreason, enum SRT_CLOSE_REASON, SRT_CLOSE_INFO
  • What protocol changes are proposed?
  • Why would not you reuse and extend existing rejection reason codes and API functions (srt_getrejectreason)?

FR #2638.

@maxsharabayko maxsharabayko added this to the v1.6.0 milestone Jun 7, 2023
@maxsharabayko maxsharabayko added Type: Enhancement Indicates new feature requests [core] Area: Changes in SRT library core [API] Area: Changes in SRT library API labels Jun 7, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #2747 (4b8df2c) into master (9448e26) will decrease coverage by 0.09%.
The diff coverage is 59.34%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master    #2747      +/-   ##
==========================================
- Coverage   67.09%   67.01%   -0.09%     
==========================================
  Files          99       99              
  Lines       20165    20266     +101     
==========================================
+ Hits        13530    13581      +51     
- Misses       6635     6685      +50     
Impacted Files Coverage Δ
srtcore/core.h 70.37% <0.00%> (-1.79%) ⬇️
srtcore/srt.h 100.00% <ø> (ø)
srtcore/srt_c_api.cpp 43.52% <7.69%> (-2.89%) ⬇️
srtcore/group.cpp 37.74% <33.33%> (-0.03%) ⬇️
srtcore/core.cpp 61.29% <57.14%> (-0.15%) ⬇️
srtcore/api.cpp 53.54% <70.96%> (+0.28%) ⬆️
srtcore/api.h 96.66% <100.00%> (+0.11%) ⬆️
srtcore/atomic_clock.h 100.00% <100.00%> (ø)
srtcore/packet.cpp 45.87% <100.00%> (ø)
srtcore/queue.cpp 80.76% <100.00%> (+1.20%) ⬆️

... and 7 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ethouris ethouris changed the title [core][apps] Implemented the socket close reason feautre [core][apps] Implemented the socket close reason feature Jun 7, 2023
@maxsharabayko maxsharabayko modified the milestones: v1.5.4, v1.6.0 Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[API] Area: Changes in SRT library API [core] Area: Changes in SRT library core Type: Enhancement Indicates new feature requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants